MDP & Q-Learning & SARSA

In this lab, we introduce the concept of a Markov Decision Process (MDP) and two algorithms for solving it. We then introduce the Q-Learning and SARSA algorithms, and finally use Q-Learning to train an agent to play the game "Flappy Bird".

Comparison between Q-Learning and SARSA
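
Before looking at the results, it may help to recall where the two algorithms differ: only in the value they bootstrap from. The sketch below is a minimal tabular illustration, not the lab's actual training code. The learning rate, the discount factor, the state names, and the action encoding (0 = do nothing, 1 = flap) are all assumptions made for this example.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)
ACTIONS = (0, 1)  # hypothetical encoding: 0 = do nothing, 1 = flap


def q_learning_update(Q, s, a, r, s_next, actions=ACTIONS):
    """Off-policy update: bootstrap from the best action available in s_next."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    target = r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])


# Two identical Q-tables; unseen (state, action) pairs default to 0.0.
q_ql = defaultdict(float, {("s1", 0): 1.0, ("s1", 1): 5.0})
q_sarsa = defaultdict(float, {("s1", 0): 1.0, ("s1", 1): 5.0})

# Same transition (s0, flap, reward 1, s1) for both algorithms.
# Suppose the behaviour policy explores and picks action 0 in s1.
q_learning_update(q_ql, "s0", 1, 1.0, "s1")
sarsa_update(q_sarsa, "s0", 1, 1.0, "s1", a_next=0)

print(q_ql[("s0", 1)], q_sarsa[("s0", 1)])
```

With these made-up numbers, Q-Learning moves its estimate toward the greedy value of `s1`, while SARSA moves toward the value of the exploratory action that was actually chosen, so the SARSA estimate ends up lower.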

Comparison of Lifetime

| Type | Lifetime |
| ---------- | -------- |
| Q-Learning | |
| SARSA | |

The lifetime and reward curves follow a similar trend, because the reward in Flappy Bird reflects how long the bird survives.

| Type | Reward |
| ---------- | ------ |
| Q-Learning | |
| SARSA | |
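
To see why the two curves track each other, consider a simplified reward scheme in which the agent receives a fixed reward for every frame it stays alive. This scheme is an assumption for illustration only; the lab's actual reward function may differ, for example by adding a penalty on death. The function name `episode_return` and the episode lengths are hypothetical.

```python
def episode_return(lifetime_frames, alive_reward=1.0):
    """Undiscounted return of an episode lasting `lifetime_frames` frames,
    assuming a constant reward for each frame survived."""
    return sum(alive_reward for _ in range(lifetime_frames))


# Made-up episode lengths, not measured results from the lab.
lifetimes = [12, 40, 95]
returns = [episode_return(n) for n in lifetimes]

for n, g in zip(lifetimes, returns):
    print(f"lifetime={n} frames -> return={g}")
```

Under this assumption the return is proportional to the lifetime, so any improvement in survival time shows up as the same improvement in reward.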

There w